K-mer Content, Correlation, and Position Analysis of Genome DNA Sequences for the Identification of Function and Evolutionary Features

Authors

Aaron Sievers

Katharina Bosiek

Marc Bisch

Chris Dreessen

Jascha Riedel

Patrick Froß

Michael Hausmann

Georg Hildenbrand

Abstract

In genome analysis, k-mer-based comparison methods have become standard tools. However, even though they are able to deliver reliable results, other algorithms seem to work better in some cases. To improve k-mer-based DNA sequence analysis and comparison, we successfully checked whether adding positional resolution is beneficial for finding and/or comparing interesting organizational structures. A simple but efficient algorithm for extracting and saving local k-mer spectra (frequency distribution of k-mers) was developed and used. The results were analyzed by including positional information based on visualizations as genomic maps and by applying basic vector correlation methods. This analysis was concentrated on small word lengths (1 ≤ k ≤ 4) on relatively small viral genomes of Papillomaviridae and Herpesviridae, while also checking its usability for larger sequences, namely human chromosome 2 and the homologous chromosomes (2A, 2B) of a chimpanzee. Using this alignment-free analysis, several regions with specific characteristics in Papillomaviridae and Herpesviridae formerly identified by independent, mostly alignment-based methods, were confirmed. Correlations between the k-mer content and several genes in these genomes have been found, showing similarities between classified and unclassified viruses, which may be potentially useful for further taxonomic research. Furthermore, unknown k-mer correlations in the genomes of Human Herpesviruses (HHVs), which are probably of major biological function, are found and described. Using the chromosomes of a chimpanzee and human that are currently known, identities between the species on every analyzed chromosome were reproduced. This demonstrates the feasibility of our approach for large data sets of complex genomes. 
Based on these results, we suggest k-mer analysis with positional resolution as a method for closing a gap between the effectiveness of alignment-based methods (like NCBI BLAST) and the high pace of standard k-mer analysis.
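The two steps the abstract names, extracting local k-mer spectra and comparing them with a basic vector correlation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed-size sliding window, the `window` and `step` parameters, and the choice of Pearson correlation are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"  # assumption: k-mers containing other symbols (e.g. N) are ignored


def kmer_spectrum(seq, k):
    """Relative frequency of each of the 4**k possible k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def local_spectra(seq, k, window, step):
    """(start position, spectrum) for each window; window and step are hypothetical parameters."""
    seq = seq.upper()
    return [
        (start, kmer_spectrum(seq[start:start + window], k))
        for start in range(0, len(seq) - window + 1, step)
    ]


def pearson(u, v):
    """Pearson correlation of two equal-length vectors; returns 0.0 if either is constant."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # Toy sequence, not real genome data.
    spectra = local_spectra("AAAACCCCAAAACCCC", k=1, window=4, step=4)
    for start, spectrum in spectra:
        # Correlate every window against the first one to get a positional profile.
        print(start, round(pearson(spectra[0][1], spectrum), 3))
```

Keeping the start position next to each spectrum is what gives the positional resolution: the correlation values can then be plotted along the sequence as a simple genomic map.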

Similar resources

Comparison of Phylogenetic and Evolutionary of Nucleotide Squences of HVR1 region of Mitochondria genom in Goats and Other Livestock Species

Maintaining genomic diversity in goat populations in different parts of Iran is essential for breeding programs, increasing production, survival, resistance to diseases, and various environmental changing conditions. The aim of the present study was to determine the sequence of HVR1 from the mitochondrial genome of Iranian native goats including Sistani, Pakistani, Black and Lorry ecotypes...

Phylogenetic Assessment of Some Species of Crocus Genus Using DNA Barcoding

DNA barcoding is a simple method for the identification of any species using a short genetic sequence from a standard genome section. The present study aimed at examining the nuclear and chloroplast diversity as well as the phylogenetic relationships of eight species of saffron including four spring-flowering and five autumn-flowering species from different parts of Iran, using the nuclear barc...

An Evolutionary and Phylogenetic Study of the BMP15 Gene

DNA sequence data contains a wealth of biologically useful information. Recent innovations in DNA sequencing technology have greatly increased our capacity to determine massive amounts of nucleotide sequences. These sequences can be used to specify the characteristics of different regions, interpret the evolutionary relationships between categorized groups, likelihood of performing multiple com...

Non-alignment comparison of human and high primate genomes

Compositional spectra (CS) analysis based on k-mer scoring of DNA sequences was employed in this study for dot-plot comparison of human and primate genomes. The detection of extended conserved synteny regions was based on continuous fuzzy similarity rather than on chains of discrete anchors (genes or highly conserved noncoding elements). In addition to the high correspondence found in the compa...

Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...

Journal title:

Volume 8, Issue

Pages -

Publication date: 2017
